Business Forecasting › Class Slides

Time Series Regression

Lecture 6

How can past and current values of other variables help us forecast?

A linear regression model expresses the forecast variable as a linear function of predictors.
The multiple linear regression model:
yt = β0 + β1x1,t + β2x2,t + … + βkxk,t + εt
yt is the forecast variable (dependent variable) at time t.
xj,t are the predictor variables (independent variables).
βj are the unknown coefficients, estimated from data.
εt is the error term — the part of yt not explained by the predictors.

How are the regression coefficients estimated?

Ordinary Least Squares minimizes the sum of squared residuals.
OLS chooses β̂ to minimize:
SSR = ∑t=1T et² = ∑t=1T (yt − β0 − β1x1,t − … − βkxk,t)²
The solution (in matrix form) is:
β̂ = (X′X)⁻¹X′y
Interpretation: β̂j is the estimated change in y associated with a one-unit increase in xj, holding all other predictors constant.
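For intuition, the one-predictor case can be computed by hand: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the means. A minimal Python sketch with made-up numbers (the matrix formula above generalizes this to many predictors):

```python
# Minimal OLS sketch for one predictor: beta1 = Cov(x,y)/Var(x),
# beta0 = ybar - beta1*xbar. Data are hypothetical, roughly y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

beta1 = sxy / sxx                      # estimated slope (about 1.99 here)
beta0 = ybar - beta1 * xbar            # estimated intercept
residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]
ssr = sum(e ** 2 for e in residuals)   # the quantity OLS minimizes

print(round(beta1, 3), round(beta0, 3), round(ssr, 4))
```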

The Gauss-Markov assumptions make OLS optimal.

1. Linearity — the true relationship is linear in the parameters.
  • Nonlinear relationships require transformations or nonlinear models.
2. No perfect multicollinearity — predictors are not exact linear combinations of each other.
  • Perfect multicollinearity makes (X′X) non-invertible and coefficients undefined.
3. Zero conditional mean — E[εt | X] = 0.
  • Requires exogeneity: no omitted variables correlated with X.
4. Homoskedasticity — Var(εt | X) = σ² (constant).
  • Violated when residual spread grows with the level of X or time.
5. No serial correlation — Cov(εt, εs) = 0 for t ≠ s.
  • Often violated in time series data — this is a key challenge.
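Assumption 2 can be seen mechanically: a Python sketch (hypothetical data, not R) showing that when one predictor is an exact multiple of another, X′X has determinant zero and cannot be inverted:

```python
# With perfectly collinear predictors, X'X is singular (det = 0),
# so (X'X)^{-1} does not exist and OLS coefficients are undefined.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 4.0, 6.0, 8.0]          # = 2 * x1: perfect multicollinearity
X = [[1.0, a, b] for a, b in zip(x1, x2)]   # design matrix with intercept

# X'X is 3x3 here (intercept + 2 predictors)
XtX = [[sum(row[i] * row[j] for row in X) for j in range(3)]
       for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

print(det3(XtX))  # 0.0: X'X is not invertible
```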
Serial correlation in residuals is the biggest threat in time series regression.
If the error terms εt are correlated across time, OLS standard errors are wrong — usually too small. This leads to:
  • t-statistics that appear significant when they are not.
  • Confidence intervals that are too narrow.
  • Prediction intervals that understate true uncertainty.
Detection: plot the ACF of residuals; use the Breusch-Godfrey test (preferred over Durbin-Watson for multiple lags).
Fix: model the serial correlation explicitly by adding lagged dependent variables, ARIMA errors, or moving to a dynamic regression framework.
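The ACF diagnostic amounts to a short computation: correlate the residual series with lagged copies of itself and compare against the rough ±2/√T band. A Python sketch with a made-up residual series:

```python
# Sample autocorrelation of residuals at lags 1..max_lag. Residuals from a
# well-specified model should stay roughly within +/- 2/sqrt(T).
def acf(e, max_lag):
    T = len(e)
    mean = sum(e) / T
    denom = sum((v - mean) ** 2 for v in e)
    return [sum((e[t] - mean) * (e[t - k] - mean) for t in range(k, T)) / denom
            for k in range(1, max_lag + 1)]

# Hypothetical, strongly autocorrelated "residuals" (slow drift, AR(1)-like):
e = [1.0, 0.9, 0.7, 0.6, 0.4, 0.1, -0.2, -0.4, -0.5, -0.7, -0.8, -1.0]
r = acf(e, 3)
threshold = 2 / len(e) ** 0.5
print([round(v, 2) for v in r])
print(r[0] > threshold)   # True: lag-1 spike signals serial correlation
```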
Heteroskedasticity means the error variance is not constant.
In time series, heteroskedasticity often appears as residuals that are larger during volatile periods (recessions, crises) than during stable ones.
Consequence: OLS is still unbiased, but standard errors are incorrect. Test statistics and prediction intervals are unreliable.
Detection: plot residuals vs. fitted values and vs. time. A “fanning out” pattern indicates heteroskedasticity.
Fixes:
  • Log-transform the dependent variable (if variance grows with level).
  • Use heteroskedasticity-consistent (HC) standard errors — also called “robust” standard errors.
  • In R: coeftest(fit, vcov = vcovHC(fit)) (from the lmtest and sandwich packages), or TSLM() with robust SEs.
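The difference between classical and robust standard errors is easy to see in the one-predictor case. A Python sketch with hypothetical data whose error spread grows with x (the classical formula pools all squared residuals; HC0 weights each one by its leverage on the slope):

```python
# Classical vs heteroskedasticity-consistent (HC0, "White") standard
# errors for the slope in a one-predictor regression.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 2.3, 2.7, 4.9, 4.2, 7.5]   # spread grows with x (hypothetical)

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Classical: assumes a single constant error variance sigma^2.
sigma2 = sum(ei ** 2 for ei in e) / (n - 2)
se_classical = (sigma2 / sxx) ** 0.5

# HC0: each squared residual keeps its own weight.
se_hc0 = (sum(((xi - xbar) ** 2) * ei ** 2 for xi, ei in zip(x, e))
          / sxx ** 2) ** 0.5

print(round(se_classical, 3), round(se_hc0, 3))
```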

What is endogeneity, and why does it matter for forecasting?

Endogeneity occurs when a predictor is correlated with the error term.
This violates the zero conditional mean assumption and makes OLS estimates biased and inconsistent. Common sources:
  • Omitted variables — a variable correlated with both x and y is left out.
  • Reverse causality — y causes x as well as x causing y.
  • Measurement error in the predictor — the observed x differs from the true x.
In a forecasting context: endogeneity is less critical if the goal is prediction accuracy, not causal interpretation. A biased coefficient can still produce good forecasts if the correlation between x and y is stable. But it matters when you want to understand why a forecast is high or low.

Choosing useful predictors for time series regression

Trend — a time index captures linear growth or decline.
  • yt = β0 + β1t + εt
  • Extend with a t² term for quadratic trends.
Dummy variables for seasonality.
  • One dummy per season minus one (to avoid perfect multicollinearity with the intercept).
  • Coefficient on “January dummy” = average January deviation from the baseline month.
Intervention variables for structural breaks.
  • A step dummy (0 before event, 1 after) captures a permanent level shift.
  • A spike dummy (1 only at time t0) captures a one-off outlier.
Lagged predictors — when predictors affect y with a delay.
  • Advertising spend in month t−1 may predict sales in month t.
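All four predictor types above are just constructed columns. A Python sketch building them for a short hypothetical series (season length, intervention date, and advertising values are made up):

```python
# Building regression predictors: linear trend, seasonal dummies
# (one per season minus one), a step dummy for an intervention at t0,
# and a lag-1 copy of a predictor.
T = 8
season_period = 4          # "quarterly" seasonality for brevity
t0 = 5                     # intervention date (1-indexed)
advertising = [3.0, 2.5, 4.0, 3.5, 5.0, 4.5, 6.0, 5.5]

trend = list(range(1, T + 1))
# Season 1 is the baseline; seasons 2..4 each get a dummy column.
season = [(t - 1) % season_period + 1 for t in trend]
dummies = {s: [1 if st == s else 0 for st in season]
           for s in range(2, season_period + 1)}
step = [1 if t >= t0 else 0 for t in trend]    # permanent level shift
lag1_adv = [None] + advertising[:-1]           # x_{t-1}; first value unknown

print(trend)
print(dummies[2])   # indicator for season 2
print(step)
print(lag1_adv)
```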
More predictors are not always better.
Adding predictors always improves in-sample fit (R² never decreases). But out-of-sample forecast accuracy can deteriorate with too many predictors — this is overfitting.
Information criteria penalize model complexity to select the right number of predictors:
  • AIC (Akaike): minimizing AIC favors models that forecast well. Asymptotically equivalent to leave-one-out cross-validation.
  • AICc: corrected AIC for small samples. Use instead of AIC when T/k < 40.
  • BIC: penalizes complexity more heavily than AIC; consistent for model selection.
In fpp3: glance(fit) |> select(AIC, AICc, BIC).
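For Gaussian linear regression, the criteria can be computed directly from SSR. A Python sketch using the convention in Hyndman and Athanasopoulos's FPP text (k predictors, plus intercept and error variance, give k+2 parameters); the SSR values are hypothetical:

```python
import math

# Information criteria from a model's SSR (FPP convention).
def info_criteria(ssr, T, k):
    aic = T * math.log(ssr / T) + 2 * (k + 2)
    aicc = aic + 2 * (k + 2) * (k + 3) / (T - k - 3)
    bic = T * math.log(ssr / T) + (k + 2) * math.log(T)
    return aic, aicc, bic

# A bigger model fits better in-sample (smaller SSR) but pays a penalty:
small = info_criteria(ssr=40.0, T=50, k=2)
big   = info_criteria(ssr=38.5, T=50, k=6)
print([round(v, 1) for v in small])
print([round(v, 1) for v in big])   # all three criteria prefer "small" here
```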
R² measures in-sample fit, not forecast accuracy.
R² = 1 − SSR/TSS is the fraction of variance explained by the model in the training data. It always increases when you add predictors, even useless ones.
Adjusted R² penalizes extra parameters: R̄² = 1 − (1−R²)(T−1)/(T−k−1). Better than R² for model comparison, but still an in-sample measure.
The right metric for forecasting is out-of-sample accuracy (MASE, RMSE on test data, or TSCV). A model with R² = 0.95 that cannot beat seasonal naïve on new data is useless for forecasting.
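The R² vs. adjusted R² contrast is visible with two lines of arithmetic. A Python sketch with hypothetical SSR/TSS values: adding three useless predictors nudges R² up but pushes adjusted R² down:

```python
# R^2 always rises when predictors are added (SSR can only fall),
# but adjusted R^2 can fall.
def r2_and_adj(ssr, tss, T, k):
    r2 = 1 - ssr / tss
    adj = 1 - (1 - r2) * (T - 1) / (T - k - 1)
    return r2, adj

T, tss = 30, 100.0
r2_a, adj_a = r2_and_adj(ssr=40.0, tss=tss, T=T, k=2)   # 2 predictors
r2_b, adj_b = r2_and_adj(ssr=39.9, tss=tss, T=T, k=5)   # 3 useless ones added
print(round(r2_a, 3), round(adj_a, 3))
print(round(r2_b, 3), round(adj_b, 3))   # R^2 up, adjusted R^2 down
```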
Nonlinear relationships can often be linearized through transformation.
Common linearizing transformations:
Model | Equation | Interpretation of β1
Log-log | log y = β0 + β1 log x | Elasticity: 1% ↑ x ⇒ β1% ↑ y
Log-linear | log y = β0 + β1 x | Semi-elasticity: 1 unit ↑ x ⇒ 100β1% ↑ y
Linear-log | y = β0 + β1 log x | 1% ↑ x ⇒ β1/100 unit ↑ y
When forecasting from a log model, remember to back-transform with a bias correction: E[y] ≈ exp(fitted + ½σ̂²).
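The bias correction in one line of Python, with hypothetical numbers: exponentiating the log-scale forecast alone gives the median of y; adding half the residual variance approximates the mean, which is always larger:

```python
import math

# Back-transforming a forecast made on the log scale.
fitted_log = 4.6          # model's forecast of log(y) (hypothetical)
sigma2 = 0.25             # estimated residual variance on the log scale

median_forecast = math.exp(fitted_log)               # naive back-transform
mean_forecast = math.exp(fitted_log + sigma2 / 2)    # bias-corrected mean
print(round(median_forecast, 1), round(mean_forecast, 1))
```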

Producing forecasts from a regression model

Ex ante forecast — uses only information available at the forecast origin.
  • Future values of predictors must themselves be forecast (or be known in advance).
  • Example: forecasting electricity demand using forecast temperature.
Ex post forecast — uses actual future predictor values.
  • Only useful for model evaluation, not real-time forecasting.
  • Isolates the regression model’s contribution from predictor forecast errors.
Prediction intervals must account for two sources of uncertainty.
  • Uncertainty in the error term εT+h.
  • Uncertainty in the estimated coefficients β̂.
  • In practice: use fpp3’s forecast() which handles both automatically.
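In the one-predictor case the two uncertainty sources appear as separate terms under the square root. A Python sketch with hypothetical inputs, using 1.96 as a normal approximation to the t critical value:

```python
# 95% prediction interval for one-predictor regression. The "1" under the
# square root is the error-term uncertainty; the 1/T and leverage terms
# come from coefficient uncertainty.
T = 40
sigma_hat = 2.0           # residual standard deviation
xbar, sxx = 10.0, 300.0   # predictor mean and sum of squared deviations
x0 = 14.0                 # predictor value at the forecast point
y_hat = 55.0              # point forecast at x0

se_pred = sigma_hat * (1 + 1 / T + (x0 - xbar) ** 2 / sxx) ** 0.5
lower, upper = y_hat - 1.96 * se_pred, y_hat + 1.96 * se_pred
print(round(lower, 2), round(upper, 2))
```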
In fpp3, TSLM() fits time series linear models with convenient shorthand.
# Trend + seasonality
fit <- data |> model(TSLM(y ~ trend() + season()))
# With external predictor
fit <- data |> model(TSLM(y ~ x1 + x2 + trend()))
# Forecast with new predictor values
fit |> forecast(new_data = future_scenarios)
report(fit) gives coefficients, standard errors, t-statistics, and R². gg_tsresiduals(fit) checks assumptions.
Correlation is sufficient for forecasting; causation is required for policy.
A regression model can produce accurate forecasts even if the predictor-outcome relationship is not causal — as long as the correlation is stable and the predictor is available before the outcome.
Example: shoe sales in the prior month might correlate with consumer spending next month. Even if there is no causal story, the correlation is useful for forecasting — provided it persists.
Danger: spurious correlations (two series that both trend upward) produce high R² but zero out-of-sample predictive value. Always evaluate forecast accuracy on held-out data, regardless of how compelling the in-sample relationship looks.
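The spurious-correlation trap is easy to reproduce. A Python sketch with two deterministic, causally unrelated constructions that happen to share an upward trend; the in-sample R² is near 1 anyway:

```python
import math

# Two series that share a trend but are otherwise unrelated still
# produce a very high in-sample R^2.
T = 60
x = [t + math.sin(t) for t in range(1, T + 1)]          # trends upward
y = [2 * t + math.cos(3 * t) for t in range(1, T + 1)]  # also trends upward

xbar = sum(x) / T
ybar = sum(y) / T
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
tss = sum((yi - ybar) ** 2 for yi in y)
r2 = 1 - ssr / tss
print(round(r2, 4))   # close to 1 despite no causal link
```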
Including the right controls prevents confounded estimates.
If a variable Z causes both X and Y and is omitted from the regression, the coefficient on X will absorb the effect of Z. The estimated β̂1 is biased.
The size of the bias follows the omitted-variable-bias formula: Bias = γ · δ, where γ is the effect of Z on Y and δ = Cov(X, Z)/Var(X) is the slope from regressing the omitted variable Z on the predictor X.
In forecasting, omitted variable bias matters most when (1) you need to interpret coefficients, or (2) the omitted variable will change in the forecast period in a way that breaks the historical correlation.
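The bias formula can be verified numerically. A Python sketch with deterministic, made-up series in which Y depends on both X and Z: regressing Y on X alone yields a slope of exactly β + γ · Cov(X,Z)/Var(X):

```python
import math

# Numeric check of omitted-variable bias. When Z is left out, the slope
# on X absorbs gamma * Cov(X,Z)/Var(X).
T = 50
z = list(range(1, T + 1))
x = [zi + 3 * math.sin(zi) for zi in z]     # X correlated with Z
beta, gamma = 1.5, 0.8
y = [beta * xi + gamma * zi for xi, zi in zip(x, z)]

def slope(u, v):
    """OLS slope from regressing v on u."""
    ub = sum(u) / len(u)
    vb = sum(v) / len(v)
    return (sum((ui - ub) * (vi - vb) for ui, vi in zip(u, v))
            / sum((ui - ub) ** 2 for ui in u))

b_short = slope(x, y)     # regression omitting Z: biased slope on X
delta = slope(x, z)       # slope of Z on X
print(round(b_short, 4), round(beta + gamma * delta, 4))  # these match
```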